Optimize/streamline fill operations #2395

ncruces · 2025-04-11T13:28:08Z

This optimizes/streamlines fill operations.

Backstory:

- Since Go 1.21 we have the `clear` builtin, which is faster than `copy` when zeroing memory.
- `slices.Repeat` uses a copy loop very similar to the one we're using.
- `bytes.Repeat` does the same, but adds an 8KB maximum to the chunk size.
- The existing generated code could be streamlined.

Intuition:

- It's rare for tables to be much larger than 8KB.
- Using `clear` (actually `runtime.memclrNoHeapPointers`) from native code is harder.
- Favor smaller/simpler code for tables and faster code for memory (like the standard library).

For the compiler:

- Streamline/simplify `table.fill`.
- Optimize `memory.fill` using the 8KB maximum chunk size.

For the interpreter:

- Keep `table.fill` unchanged.
- Optimize `memory.fill` with both `clear` and the maximum chunk size.
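To make the interpreter-side idea concrete, here is a minimal sketch in plain Go. It is not the code from this PR: `fillBytes` and `maxChunk` are hypothetical names, and the loop shape is only assumed to resemble the `bytes.Repeat` approach mentioned in the backstory. It uses `clear` for zero fills, and otherwise doubles a filled prefix up to 8KB before copying that chunk across the rest of the buffer.

```go
package main

import "fmt"

// maxChunk is an assumed constant mirroring the 8KB limit discussed above.
const maxChunk = 8192

// fillBytes is a hypothetical helper that sets every byte of buf to v.
func fillBytes(buf []byte, v byte) {
	if len(buf) == 0 {
		return
	}
	if v == 0 {
		// Zeroing is the common case; the clear builtin (Go 1.21+) handles it.
		clear(buf)
		return
	}
	buf[0] = v
	// Double the filled prefix until it reaches the chunk limit or the end of buf.
	n := 1
	for n < len(buf) && n < maxChunk {
		n += copy(buf[n:min(2*n, maxChunk, len(buf))], buf[:n])
	}
	// Reuse the filled chunk for the remainder; the source stays at most 8KB.
	chunk := buf[:n]
	for n < len(buf) {
		n += copy(buf[n:], chunk)
	}
}

func main() {
	buf := make([]byte, 20000)
	fillBytes(buf, 0x2A)
	ok := true
	for _, b := range buf {
		ok = ok && b == 0x2A
	}
	fmt.Println("all bytes set:", ok)
}
```

Capping the source chunk presumably keeps it cache-resident while it is copied repeatedly, which would be why large fills benefit the most.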

For `table.Grow`:

- Recognize that zero initialization is already done.

Results:

I measured a more than 2x improvement when filling megabytes of memory, with no performance degradation at small sizes and no increase in generated code size.

Future work:

Using memory.fill to zero memory is very common. If we could access runtime.memclrNoHeapPointers in the compiler, we could potentially gain another 25% there (especially if we optimized this when zero is known at compile time).

Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>

mathetake

Fantastic

ncruces · 2025-04-12T15:55:41Z

That was way faster than I expected. 😂
I kinda expected this to be torn to pieces.

It has since occurred to me that ((i - 1) & 8191) + 1 might not be better than min(i, 8192). Wdyt?

I know too little about the back ends to know what generates better code. “Branches are bad, conditional moves not so much” is about the extent of my understanding.
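For reference, a small Go sketch comparing the two expressions. `chunkMask` and `chunkMin` are hypothetical names, and how the resulting value is consumed by the surrounding fill loop is not shown in this thread. The two agree for 1 ≤ i ≤ 8192, but above 8192 the mask form wraps around while `min` saturates, so they are only interchangeable if the loop tolerates either.

```go
package main

import "fmt"

// chunkLimit is an assumed constant for the 8KB maximum chunk size.
const chunkLimit = 8192

// chunkMask computes ((i - 1) & 8191) + 1: branch-free, yields a value in
// 1..8192 for i >= 1, and wraps around once i exceeds 8192.
func chunkMask(i int) int {
	return ((i - 1) & (chunkLimit - 1)) + 1
}

// chunkMin computes min(i, 8192): saturates at 8192 for larger i.
func chunkMin(i int) int {
	return min(i, chunkLimit)
}

func main() {
	for _, i := range []int{1, 100, 8192, 8193, 16384} {
		fmt.Println(i, chunkMask(i), chunkMin(i))
	}
}
```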

ncruces added 2 commits April 11, 2025 14:01

Simplify fill.

3a056c7

Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>

Faster fill.

2584cce

Signed-off-by: Nuno Cruces <ncruces@users.noreply.github.com>

ncruces requested a review from mathetake as a code owner April 11, 2025 13:28

mathetake approved these changes Apr 12, 2025

mathetake merged commit 242ae91 into tetratelabs:main Apr 12, 2025
50 checks passed

ncruces mentioned this pull request Apr 14, 2025

Fix typo in memory fill. #2396

Merged
